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1  Introduction 


We  consider  the  problem  of  minimizing  an  objective  function  f  :  W  U  subject  to  bound  con¬ 
straints,  i.e. 

minimize  f{x) 

subject  to  X  €:  [a,  &], 

where  a,x,b  6  and  we  write  x  6  [a, 6]  to  denote  04  <  Xi  <  6i  for  i  =  We  are 

concerned  with  special  cases  of  Problem  (1)  for  which  evaluation  of  the  objective  function  involves 
performing  one  (or  more)  comphcated,  deterministic  computer  simulation(s).  Many  such  problems 
arise  as  engineering  design  problems  and  are  often  distinguished  by  two  troubling  characteristics 
that  preclude  solution  by  traditional  algorithms  for  bound-constrained  optimization. 

First,  the  output  of  a  complicated  computer  simulation  is  usually  affected  by  a  great  many 
approximation,  rounding  and  truncation  errors.  These  errors  are  not  stochastic — ^repeating  the 
simulation  will  reproduce  them— but  their  accumulation  introduces  high-frequency,  low-amplitude 
distortions  of  the  idealized  objective  that  we  would  have  liked  to  optimize.  In  consequence,  opti¬ 
mization  algorithms  that  compute  or  apprcBdmate  (by  finite  differencing)  derivatives  of  /  often  fail 
to  exploit  general  trends  in  the  objective  function  and  become  trapped  in  local  minimizers  created 
by  high-frequency  oscillations.  In  order  to  develop  effective  al^rithms  for  such  applications,  we 
restrict  attention  to  derivative-free  methods  for  numerical  optimization. 

Second,  complicated  computer  simulations  are  often  expensive  to  perform.  Prank  (1995)  sug¬ 
gested  that  one  must  address  problems  in  which  a  typical  function  evaluation  costs  several  hours  of 
supercomputer  time.  For  example,  Booker  (1996)  and  Booker  et  al.  (1996)  studied  an  aeroelastic 
and  dynamic  response  simulation  of  a  helicopter  rotor  blade  for  which  a  single  function  evaluation 
requires  approximately  six  hours  of  cpu  time  on  a  Cray  Y-MP.  We  formalize  the  notion  that  the 
objective  function  is  expensive  to  evaluate  by  imposing  an  upper  bound  V  on  the  number  of  eval¬ 
uations  of  /  that  we  are  allowed  to  perform.  The  severity  of  this  restriction  will  depend  (in  part) 
on  the  relation  between  V  and  p. 

When  attempting  to  minimize  an  objective  function  /  that  is  too  expensive  for  standard  numer¬ 
ical  optimization  algorithms  to  succeed,  it  has  long  been  a  standard  engineering  practice,  described 
by  Barthelemy  and  Haftka  (1993),  to  replace  /  with  an  inexpensive  surrogate  /  and  minimize  / 
instead.  (For  example,  one  might  evaluate  /  at  V  —  1  carefully  selected  sites,  construct  /  from 
the  resulting  information,  use  a  standard  numerical  optunization  algorithm  to  minimize  /,  and 
evaluate  /  at  the  candidate  minimizer  thus  obtained.)  This  practice  may  also  have  the  salutory 
effect  of  smoothing  high-frequency  oscillations  in  /.  The  rapidly  growing  literature  on  computer 
experiments  offers  new  and  potentially  better  ways  of  unplementing  this  traditional  practice.  The 
prescription  that  seems  to  be  gaining  some  currency  in  the  engineering  community  was  proposed  by 
Welch  and  Sacks  (1991);  following  current  convention,  we  refer  to  it  as  DACE  (Design  and  Analysis 
of  Computer  Experiments).  Frank  (1995)  offered  an  optimizer’s  perspective  on  this  methodology, 
suggested  that  the  “minimalist  approach”  of  minimizing  a  single  /  is  not  likely  to  yield  satisfactory 
results,  and  proposed  several  sequential  modeling  strategies  as  alternatives.  Booker  (1996)  studied 
several  industrial  applications  of  DACE  and  two  alternative  approaches. 

It  is  not  our  purpose  in  this  report  to  provide  a  thorough  critique  of  DACE  as  a  method  for 
minimizing  expensive  objective  functions.  We  regard  DACE  and  traditional  iterative  methods  for 
numerical  optimization  as  occupying  opposing  ends  of  a  spectrum.  When  V  is  large  relative  to 
p,  say  p  =  2  and  V  =  500,  then  the  expense  of  function  evaluation  is  not  an  issue  and  we  are 
content  to  rely  on  traditional  iterative  methods.  When  V  is  not  large  relative  to  p,  say  p  =  2 
and  V  =  5,  then  the  expense  of  function  evaluation  is  completely  crippling  and  we  are  content  to 
rely  on  DACE.  (If  V  <  p,  then  the  methodologies  that  we  consider  are  not  appropriate.)  In  this 
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report  we  are  concerned  with  intermediate  situations  and  we  borrow  ideas  from  both  the  numerical 
optimization  and  the  computer  experiment  literatures.  We  describe  a  sequential  modeling  strategy 
in  which  computer  experiment  models  are  used  to  guide  a  grid  search  for  a  minimizer.  Our  methods 
elaborate  and  extend  an  important  special  case  of  the  general  model  management  strategy  proposed 
by  Dennis  and  Torczon  (1996)  and  developed  by  Serafini  (1997).  These  efforts  are  part  of  a  larger 
collaboration  described  by  Booker  et  al.  (1995)  and  Booker  et  al.  (1996). 

2  Pattern  Search  Methods 

We  require  a  method  of  solving  Problem  (1)  that  does  not  require  derivative  information.  For 
unconstrained  optimization,  a  popular  derivative-free  method  is  the  simplex  method  proposed  by 
Nelder  and  Mead  (1965).  This  method  is  sometimes  adapted  for  constrained  optimization  by  means 
of  a  simple  ad  hoc  device,  viz.  setting  f{x)  =  oo  when  x  ^  [a^b].  Unfortunately,  the  Nelder-Mead 
simplex  method  is  suspect  even  for  unconstrained  optimization.  For  example,  McKinnon  (1996) 
has  constructed  a  family  of  strictly  convex,  differentiable  objective  functions  on  for  which  there 
exist  starting  points  from  which  Nelder-Mead  will  fail  to  converge  to  a  stationary  point.  Instead, 
we  rely  on  a  class  of  methods  for  which  a  convergence  theory  exists,  the  paMem  search  methods 
explicated  by  Torczon  (1997)  for  the  case  of  unconstrained  optimization  and  extended  by  Lewis 
and  Torczon  (1996a)  to  the  case  of  bound-constrained  optimization. 

Pattern  search  methods  are  iterative  algorithms  for  numerical  optimization.  Such  algorithms 
produce  a  sequence  of  points,  {xfc},  from  an  initial  point,  Xo,  provided  by  the  user.  To  specify  an 
algorithm,  one  must  specify  how  it  progresses  from  the  current  iterate,  Xc^  to  the  subsequent  iterate, 
x^.  One  of  the  distinguishing  features  of  pattern  search  methods  is  that  they  restrict  their  search 
for  x^  to  a  grid  (more  formally,  a  lattice)  that  contains  Xc*  The  grid  is  modified  as  optimization 
progresses,  according  to  rules  that  ensure  convergence  to  a  stationary  point. 

The  essential  logic  of  a  pattern  search  is  summarized  in  Figure  1.  The  reader  is  advised  that  this 
description  of  pattern  search  methods  differs  from  the  presentation  in  Torczon  (1997)  and  Lewis 
and  Torczon  (1996a,  1996b).  For  example,  the  choice  of  a  starting  point  is  usually  not  restricted 
and  the  initial  grid  is  constructed  so  that  it  contains  the  starting  point.  More  significantly,  pattern 
search  methods  are  usually  specified  by  rules  that  prescribe  where  the  algorithm  is  to  search  for 
the  subsequent  iterate  and  the  notion  of  an  underlying  grid  is  implicit  in  thase  rules.  In  this  report, 
the  grid  is  explicit  and  the  search  for  a  subsequent  iterate  is  not  restricted  to  a  specific  pattern  of 
points.  What  should  be  appreciated  is  that  the  present  description  preserves  all  of  the  elements 
of  pattern  search  methods  required  by  their  convergence  theory.  A  detailed  explication  of  the 
connection  between  these  perspectives  will  be  provided  by  Serafini  (1997). 

It  is  evident  that  the  crucial  elements  of  a  pattern  search  algorithm  are  contained  in  the  speci¬ 
fication  of  2(b)  in  Figure  1.  The  fundamental  idea  is  to  try  to  find  a  point  on  the  current  grid  that 
strictly  decreases  the  current  value  of  the  objective  function.  Any  such  point  can  be  taken  to  be 
the  subsequent  iterate.  If  one  fails  to  find  such  a  point,  then  one  replaces  the  current  grid  with  a 
finer  one  and  tries  again. 

Torczon  (1997)  described  the  search  in  2(b)(i)  as  an  exploratory  moves  algorithm.  Here  we 
distinguish  two  components  of  an  exploratory  moves  algorithm:  an  oracle  that  produces  a  set  of 
trial  points  on  the  current  grid  and  a  core  pattern  of  trial  points  on  the  grid  at  which  the  objective 
function  must  be  evaluated  before  the  algorithm  is  permitted  to  refine  the  grid.  The  convergence 
theory  requires  that  the  core  pattern  satisfy  certain  hypotheses;  no  hypotheses  are  placed  on  the 
oracle. 

Because  the  methods  proposed  in  this  report  critically  depend  on  the  arbitrary  nature  of  the  or- 
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1.  Specify  the  current  grid.  Select  xq  from  the  current  grid.  Let  Xc  =  xq. 

2.  Do  until  convergence: 

(a)  Let  T(+)  =  0. 

(b)  Do  while  T(+)  =  0: 

i.  Search  the  current  grid  for  a  set  of  Xt  e  [a,  b]  at  which  /  is  then  evaluated.  Let  T(+) 
denote  the  set  of  grid  points  Xt  €  [a,h]  thus  obtained  for  which  f{xt)  <  f{xc)^ 

ii.  Update  the  grid. 

(c)  Choose  E  T(+). 

(d)  Let  Xc== 


Figure  1:  Pattern  search  methods  for  numerical  optimization. 


acle,  we  emphasize  that  any  method  wha,tsoever  can  be  employed  to  produce  points  that  potentially 
decrease  the  current  value  of  the  objective.  We  might  perform  an  exhaustive  search  of  the  current 
grid  or  we  might  specify  a  complicated  pattern  of  points  at  which  to  search.  We  might  appeal  to 
our  prior  knowledge  of,  or  our  intuition  about,  the  objective  function.  It  does  not  matter  the 
convergence  theory  for  pattern  search  methods  encompasses  all  such  possibilities. 

For  the  sake  of  clarity,  we  describe  more  fully  a  specific  pattern  search  algorithm.  First,  we 
construct  the  grids  on  which  the  searches  will  be  conducted.  For  n  ~  0, 1, 2, . . .,  we  define  vector 
lattices  r(n)  restricted  to  the  feasible  set  [a,  b]  as  follows:  x  E  r(n)  if  and  only  if  for  each  i  =  1, . . . ,  p 
there  exists  E  {0, 1, . . .  ,  2^}  such  that 

ji 

Xi  =  ai  +  —  ai) . 

Thus,  r(0)  comprises  the  vertices  of  [a,  6]  and  r(n+l)  is  obtained  from  r(n)  by  halving  the  distance 
between  adjacent  grid  points  (see  below).  When  we  update  the  current  grid,  say  r(n),  in  2(b)(ii), 
we  either  retain  r(n)  or  replace  r(n)  with  r(n  +  1). 

Next  we  specify  a  core  pattern.  Given  Xc  €  [a,  6],  we  say  that  Xt  6  [a,  6]  is  adjacent  to  Xc  if  and 
only  if  there  exists  6  {1, . . .  ,p}  such  that 

xtk  =  Xck  o.k) 

and  xti  =  Xci  for  i  ^  fc.  We  take  as  the  core  pattern  the  set  of  grid  points  adjacent  to  the  current 
iterate.  (For  example,  if  the  current  grid  is  the  integer  lattice  on  3?^  restricted  to  [a  =  (0,0)',  6  = 
(8, 8)'],  then  the  core  pattern  of  (2, 0)'  comprises  (3, 0)',  (2, 1)',  and  (1, 0)'.)  We  refine  the  grid,  i.e. 
we  replace  r(n)  with  r(n  +  1),  if  and  only  if  we  have  evaluated  /  at  each  grid  point  xt  adjacent  to 
Xc  and  failed  to  find  f{xt)  <  f{xc)- 

If  /  is  continuously  differentiable,  then  the  theory  developed  by  Lewis  and  Torczon  (1996a) 
guarantees  that  the  specified  algorithm  will  produce  a  sequence  {xk}  that  converges  to  a  Karush- 
Kuhn-Tucker  point  of  Problem  (1).  In  practice,  of  course,  the  algorithm  must  terminate  in  a  finite 
number  of  steps.  Termination  criteria  for  pattern  search  methods — indeed,  for  direct  search  meth¬ 
ods  in  general — have  not  been  studied  extensively,  but  that  does  not  concern  us  here.  By  definition, 
the  assumption  that  Problem  (1)  is  expensive  means  that  we  cannot  afford  enough  evaluations  of 
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the  objective  function  to  terminate  by  the  traditional  criteria  of  numerical  optimization.  For  the 
problems  that  we  consider,  the  relevant  termination  criterion  is  whether  or  not  we  have  exhausted 
the  permitted  number  (F)  of  function  evaluations. 

Derivative-free  methods  for  numerical  optimization  can  be  quite  profligate  with  respect  to  the 
number  of  function  evaluations  that  they  require.  Because  the  number  of  function  evaluations 
available  to  us  is  severely  limited,  we  want  to  use  these  evaluations  as  efficiently  as  possible.  On 
the  bright  side,  the  convergence  theory  for  pattern  search  methods  allows  us  to  replace  Xc  with  any 
xt  for  which  f{xt)  <  f{xc)-  Hence,  no  matter  how  comprehensive  a  search  for  trial  points  we  may 
have  envisioned,  we  can  abort  it  as  soon  as  we  find  a  single  trial  point  that  satisfies  f{xt)  <  fixc)- 
On  the  dark  side,  the  oracle  may  require  a  great  many  function  evaluations  to  produce  even  one 
Xt  for  which  f{xt)  <  f(xc).  Furthermore,  if  the  oracle  is  unsuccessful,  then  we  can  not  refine  the 
current  grid  until  after  /  has  been  evaluated  at  each  grid  point  adjacent  to  Xc — a  process  that  may 
require  as  many  as  2p  additional  function  evaluations  if  Xc  is  an  interior  grid  point  and  /  has  not 
yet  been  evaluated  at  any  of  the  grid  points  adjacent  to  Xc- 

The  fundamental  purpose  of  this  report  is  to  introduce  methods  that  reduce  the  expense  of  the 
oracle.  We  do  not  attempt  to  reduce  the  number  of  points  in  the  core  pattern.  For  unconstrained 
optimization,  Lewis  and  Torczon  (1996b)  have  introduced  core  patterns  that  contain  only  p  +  1 
points. 

Pattern  search  algorithms  may  intentionally  use  large  numbers  of  function  evaluations.  For 
example,  the  oracle  employed  by  Dennis’s  and  Torczon’s  (1991)  parallel  direct  search  intentionally 
casts  a  wide  net,  relying  on  parallel  computation  to  defray  the  expense  of  evaluating  the  objective 
function  at  a  great  many  grid  points.  We  are  concerned  with  problems  for  which  huge  numbers  of 
function  evaluations  are  not  possible.  Here,  we  want  an  oracle  that  proposes  promising  trial  points 
using  only  a  small  number  of  fimction  values.  Our  strategy  for  creating  such  an  oracle  will  be  to 
use  previous  function  values  to  construct  a  current  global  model,  fc,  of  /,  then  use  fc  to  predict 
trial  points  xt  at  which  f{xt)  <  fixc)-  Thus,  we  will  employ  the  strategy  described  in  Section  1, 
not  once  to  replace  Problem  (1),  but  repeatedly  to  guide  us  to  a  solution  of  it.  We  now  turn  to  a 
description  of  the  models  that  our  oracles  will  exploit. 


3  Computer  Experiment  Models 

Suppose  that  we  have  observed  yt  =  f{xi)  for  i  =  1, . . .  ,n.  On  the  basis  of  this  information,  we 
want  to  construct  a  global  model  /  of  /.  Such  inexpensive  surrogates  for  /  will  be  used  by  the  oracle 
in  the  pattern  search  algorithm  to  identify  promising  trial  points  at  which  to  compute  additional 
function  values. 

We  assume  that  there  is  no  uncertainty  in  the  yt  =  f{xi),  i.e.  that  no  stochastic  mechanism  is 
involved  in  evaluating  the  objective  function.  It  is  then  reasonable  to  require  the  surrogate  function 
to  interpolate  these  values,  i.e.  to  require  that  f{xi)  =  f{xi).  Rirthermore,  we  desire  families  of 
surrogate  functions  that  are  rich  enough  to  model  a  variety  of  complicated  objective  functions.  We 
are  led  to  consider  certain  families  of  models  that  have  been  studied  in  the  spatial  statistics  and 
computer  experiment  literatures. 

The  families  of  models  that  we  consider  are  usually  motivated  by  supposing  that  /  is  a  reahzation 
of  some  (nice)  stochastic  process.  For  certain  geostatistical  applications,  this  supposition  may  be 
quite  plausible.  In  the  context  of  using  computer  simulations  to  facilitate  the  engineering  of  better 
product  designs,  its  plausibility  is  less  clear.  The  high-freqiiency,  low-amplitude  oscillations  that 
we  have  described  do  resemble  the  realization  of  a  stochastic  process,  but  the  general  trends  that 
are  our  primary  concern  do  not.  In  any  case,  we  regard  the  supposition  of  an  underlying  stochastic 
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process  as  nothing  more  than  a  convenient  fiction.  The  value  of  this  fiction  lies  in  its  power  to 
suggest  plausible  ways  of  constructing  useful  models  and  we  will  try  to  avoid  invoking  it  excessively. 
When  we  do  invoke  it,  it  should  be  appreciated  that  optimality  criteria  such  as  BLUP  and  MLE 
(see  below)  are  defined  with  respect  to  the  fictional  stochastic  process  and  should  not  be  invested 
with  more  importance  than  the  practical  utility  of  the  models  to  which  they  lead. 

Our  requirement  that  f{xi)  =  f{xi)  will  immediately  suggest  spline  interpolation  to  the  ap¬ 
proximation  theorist  and  kriging  to  the  geostatistician.  In  fact,  as  explicated  by  Watson  (1984)  and 
others,  these  two  well-known  methodologies  are  formally  equivalent.  Their  motivations,  however, 
are  somewhat  different:  whereas  the  goal  of  the  former  is  to  interpolate  the  /  (xi)  as  smoothly  as 
possible,  the  goal  of  the  latter  is  to  approximate  /  as  accurately  as  possible.  It  is  evident  that  the 
kriging  perspective  is  more  germane  to  our  present  concerns.  The  remainder  of  this  section  briefly 
summarizes  some  relevant  facts  about  kriging  in  the  context  of  computer  experiments.  See  Sacks, 
Welch,  Mitchell  and  Wynn  (1989),  Booker  (1994),  and  Koehler  and  Owen  (1996)  for  comprehensive 
surveys  of  computer  experiment  methodology. 

We  begin  by  assuming  that  /  is  a  realization  of  a  stochastic  process  F  that  is  indexed  by  the 
continuous  parameter  set  3?^’.  We  assume  that  this  process  has  known  mean  fi(x)  =  0  and  known 
covariance  function  c(-,  •),  and  that  each  symmetric  pxp  matrix  c(s,t)  is  strictly  positive  definite. 
Let 


y  = 


f{Xn) 


and  for  each  a;  6  define  h{x)  e  3?”  to  minimize  E[y'b—  F{x)]^.  Then  f{x)  =  y'b{x)  is  the  best 
linear  unbiased  predictor  (BLUP)  of  /(a:)  and  it  is  well-known  that 

f{x)  =  y'C  ^c{x),  (2) 

where  C  is  the  symmetric  positive  definite  n  x  n  matrix  [c(a;i,  Xj)]  and 


c{xi,x) 


c(.t)  = 


This  is  a  simple  example  of  kriging.  Notice  that  kriging  necessarily  interpolates:  since  C  =  I 
and  c{xj)  is  column  j  of  C, 

f{xj)  =  i/C  ^c{xj)  =y'ej  =  yj  =  f{xj). 


Thus  far  we  have  assumed  that  the  stochastic  process  is  known.  We  now  suppose  that  F  is  a 
Gaussian  process  with  mean  p{x)  =  a{x)'P  and  covariance  function  c{s,t)  =  a'^re{s,t).  We  assume 
that  a  :  3?*’  ^  3R^  is  a  known  function,  that  e  is  an  unknown  vector,  that  >  0  is  an 
unknown  scalar,  and  that  re{-,  ■)  is  an  rmknown  element  of  a  known  family  of  correlation  functions. 

Next  let  A  denote  the  n  x  q  matrix  [aj(xi)],  let  R{d)  denote  the  symmetric  nx  n  matrix 
[r0{xi,Xj)],  and  let 

r  rg{xi,x)  1 


r{x\  &) 


rs{Xn,x)  J 


Then,  for  /3  and  9  fixed,  the  BLUP  of  f{x)  is  (cf.  equation  (2)) 


/»  =  a{x)'P  +  {y-  A0)'R{9)  h(x-,  9). 


(3) 
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Thus,  by  varying  /?  and  0,  we  define  a  family  of  interpolating  functions  from  which  we  can  select  a 
specific  /  to  model  /. 

Given  a  family  of  interpolating  functions  defined  by  (3),  we  require  a  sensible  way  of  specifying 

and  thereby  /,  For  9  fixed,  the  maximum  likelihood  estimates  (MLEs)  of  jS  and  have 
explicit  formulas: 

0{d)  =  [A'i?(e)  ^  A'R{8) 

and 

o\e)  =  \[y-  '  [y  -  ^${9)] . 

To  compute  0,  the  MLE  of  0,  it  turns  out  that  one  must  minimize 

n  log  (0)  +  log  det  R{0 :  '  I : 


as  a  function  of  0. 

It  is  now  evident  that,  in  specifying  a  famUy  of  correlation  functions,  there  is  a  potential  tradeoff 
between  the  richness  of  the  family  defined  by  (3)  and  the  ease  of  maximizing  (4)  ,  The  richer  the 
family  of  models,  the  more  difficult  it  may  be  to  select  a  plausible  member  of  it*  Most  of  the 
previous  research  on  computer  experiments  has  been  concerned  with  deriving  a  single  model  / 
that  will  be  used  as  a  permanent  surrogate  for  /,  Understandably,  the  authors  have  used  rich 
families  with  rather  complicated  correlation  functions  for  which  0  is  a  vector  of  dimension  p  or 
greater*  This  makes  maximizing  (4)  difficult,  but  0  need  only  be  computed  once.  In  contrast,  we 
are  concerned  with  deriving  a  sequence  of  models  that  will  be  used  for  the  sole  purpose  of  guiding 
our  optimization  of  /.  Hence,  we  are  content  to  sacrifice  some  flexibility  in  (3)  in  order  to  simplify 
minimizing  (4).  In  the  numerical  experiments  that  we  report  in  Section  5,  we  used  the  1-parameter 
isotropic  correlation  function  defined  by 

=exp(-e||s-<||^),  (5) 

where  ||  »  ||  denotes  the  Euclidean  norm  on 


4  Model- Assisted  Grid  Search 

The  optimization  strategies  developed  in  this  report  are  all  predicated  on  a  simple  idea,  viz.  that 
providing  the  oracle  in  Section  2  with  an  inexpensive  surrogate  model  of  the  objective  function 
will  allow  it  to  more  efficiently  identify  promising  trial  points  and  thereby  reduce  the  cost  of 
optimization.  The  surrogate  models  will  be  constructed  from  known  function  values  by  kriging, 
as  described  in  Section  3.  In  this  section,  we  describe  the  interplay  between  the  pattern  search 
algorithm  and  the  kriging  models  in  greater  detail. 

We  begin  with  an  instructive  analogy.  For  unconstrained  minimization  of  smooth  objective 
functions,  the  numerical  algorithms  of  choice  are  the  quasi-Newton  methods.  An  elementary  expo¬ 
sition  of  these  methods  was  provided  by  Dennis  and  Schnabel  (1983),  who  emphasized  the  following 
interpretation;  a  quasi-Newton  algorithm  constructs  a  quadratic  model  fc  of  the  objective  function 
/  at  the  current  iterate  Xc  and  uses  fc  to  identify  a  trial  point  xt  at  which  it  is  predicted  that 
f{xt)  <  f{xc).  For  example,  trust  region  methods  obtain  xt  by  (approximately)  minimizing  fc 
subject  to  the  constraint  that  xt  be  within  a  specified  neighborhood  of  Xc-  The  rationale  for  the 
constraint  is  that  the  model  fc,  which  is  usually  constructed  by  computing  or  approximating  the 
second-order  Taylor  polynomial  of  /  at  Xc,  can  only  be  trusted  to  approximate  /  locally. 
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1.  Specify  an  initial  grid,  To,  on  [a,  6]. 

2.  Perform  an  initial  computer  experiment: 


(a)  Select  initial  design  sites  xi,...  ,xn  G  Pq- 

(b)  Compute /(xi), ...  ,/(a;jv). 

(c)  Construct  fo  by  kriging. 

3.  Let  Pc  =  Fq.  Let  Xc  =  argmin(/(a;i), . . .  ,/(.Xiv)).  Let  fc  =  /o  and  Evab  =  {3:1, . . .  ,xn}- 

4.  Do  until  convergence: 

(a)  Do  until  Core(.'rc)  C  Evalg: 

i.  Apply  an  optimization  algorithm  to  fc  to  obtain  Xt  G  \  Evalg. 

ii.  Compute  f{xt).  Let  Evalc  =  EvaL  U  {xt}.  Update  fc- 

iii.  If  f{xt)  <  f{xc),  then  let  Xc  =  Xt- 

(b)  Refine  Fc. 

Figure  2:  Model-Assisted  Grid  Search  (MAGS)  for  minimizing  an  expensive  objective  function. 


Tbust  region  methods  make  effective  use  of  simple  local  models  of  the  objective  function.  Be¬ 
cause  we  are  concerned  with  situations  in  which  evaluation  of  the  objective  function  is  prohibitively 
expensive,  we  are  prepared  to  invest  more  resources  in  constructing  and  optimizing  more  compli¬ 
cated  global  models  of  the  objective  function. 

Similarly,  classical  response  surface  methodology,  from  Box  and  Wilson  (1951)  to  Myers  and 
Montgomery  (1995),  constructs  local  linear  or  quadratic  regression  models  of  a  stochastic  quadratic 
objective  function.  Again,  the  purpose  of  these  models  is  to  guide  the  search  for  a  minimizer  or 
maximizer.  Glad  and  Goldstein  (1977)  also  exploited  quadratic  regression  models  for  optimization, 
as  did  Elster  and  Neumaier  (1995)  to  guide  a  grid  search.  Recently,  nonparametric  response  surface 
methods  have  been  proposed  in  which  global  models  of  more  complicated  objective  functions  are 
constructed.  This  work,  e.g.  Haaland  (1996),  is  closely  related  to  ours.  Prank  (1995)  surveyed 
numerous  issues  that  arise  when  using  global  models  of  expensive  objective  functions  to  facilitate 
optimization. 

The  remainder  of  this  section  develops  a  specific  methodology  for  using  global  models  to  mini¬ 
mize  expensive  objective  functions.  The  essential  logic  of  the  methods  that  we  propose  is  summa¬ 
rized  in  Figure  2.  For  simphcity,  we  assume  that  the  expense  of  all  operations  performed  on  the 
model(s)  is  negligible  in  comparison  to  the  expense  of  function  evaluation (s).  Dennis  and  Torczon 
(1996)  have  proposed  a  general  model  management  strategy  that  can  be  employed  when  it  is  nec¬ 
essary  to  balance  these  expenses.  This  model  management  strategy  is  currently  being  developed 
and  extended  by  Serafini  (1997)  and  is  capable  of  accommodating  our  methods  as  a  special  case. 

Techniques  for  designing  the  initial  computer  experiment  in  step  2  do  not  concern  us  here, 
except  for  two  details.  First,  it  is  convenient  to  employ  a  technique  that  permits  the  initial  design 
sites  to  be  selected  from  the  initial  grid.  Several  such  techniques  were  described  by  Koehler  and 
Owen  (1996).  Second,  we  must  specify  iV,  the  number  of  initial  design  sites.  A  great  deal  of 
numerical  experimentation  will  be  required  before  it  becomes  possible  to  suggest  useful  guidelines 
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for  the  choice  of  N.  Because  we  are  concerned  with  problems  for  which  sequential  optimization  is 
practicable,  considerably  less  than  the  entire  budget  of  functions  evaluations  (V^)  will  be  invested 
in  initial  design,  i.e.  we  will  choose  iV  <  F.  At  present,  we  prefer  to  choose  N  >p  + I,  thereby 
permitting  at  least  a  simplex  design. 

Because  evaluation  of  the  objective  function  is  expensive,  it  seems  sensible  to  cache  the  function 
values  that  have  been  computed.  We  denote  the  current  set  of  points  at  which  /  has  been  evaluated 
by  Evalc,  which  in  step  3  we  initialize  to  comprise  the  initial  design  sites.  For  any  point  x  €  Fc,  we 
denote  by  Core(a;)  the  core  pattern  of  points  adjacent  to  x.  Then,  in  loop  4(a),  Core(.Tc)  C  Evalc 
is  precisely  the  condition  required  by  the  convergence  theory  for  pattern  search  methods  that  must 
be  satisfied  before  the  current  grid  can  be  refined.  Notice  that  this  condition  must  eventually 
obtain,  i.e.  loop  4(a)  cannot  repeat  indefinitely,  because  Fc  is  finite  and  we  only  consider  trial 
points  Xt  €  Fc  at  which  /  has  not  yet  been  evaluated.  Thus,  the  logic  of  loop  4  is  to  (a)  search  for 
points  of  improvement  on  the  current  grid  until  (b)  we  replace  the  current  grid  with  a  finer  grid. 
Loop  4  repeats  until  we  have  exhausted  our  budget  of  objective  function  evaluations. 

It  is  within  loop  4(a)  that  we  exploit  the  models  obtained  by  kriging.  Step  4(a)  (i)  produces  a 
trial  point  xt  at  which  /  is  evaluated  in  the  hope  that  f{xt)  <  f{xc).  As  soon  as  f{xt)  has  been 
computed,  we  add  xt  to  the  set  Evalc  and  update  the  current  model  fc-  It  is  not  entirely  clear 
how  best  to  update  fc.  One  could  completely  refit  the  family  of  models  to  the  new  set  of  function 
values,  but  one  might  decline  to  re-estimate  certain  parameters  if  one  believed  that  the  value  of 
updating  them  did  not  justify  the  expense. 

The  generality  of  the  convergence  theory  for  pattern  search  methods  permits  us  to  be  rather 
vague  in  specifying  step  4(a)  (i).  This  ambiguity  is  desirable  because  it  encompasses  a  great  many 
possibilities  deserving  consideration.  Thus,  we  can  search  for  a  local  minimizer  of  fc  using  whatever 
algorithm  we  prefer,  e.g.  quasi-Newton,  steepest  descent,  pattern  search,  etc.  We  can  start  our 
search  from  whatever  point  we  prefer  or  we  can  use  multiple  starting  points  and  pursue  multiple 
searches  before  determining  Xt-  We  can  even  search  for  a  global  minimizer  of  fy  if  we  are  so  inclined. 
Furthermore,  we  can  terminate  our  search  whenever  we  please.  (We  envision  searching  until  we  near 
a  local  minimizer  of  fc,  but  we  can  terminate  sooner  if  the  search  becomes  expensive.  Tbade-olfe 
between  the  cost  of  evaluating  the  objective  function  and  the  cost  of  constructing  and  optimizing 
the  models  can  be  mediated  by  the  model  management  strategy  proposed  by  Dennis  and  Torczon 
(1996)  and  developed  by  Serafim  (1997).)  The  only  requirement  is  that  we  eventually  return  a  trial 
point  Xt  that  belongs  to  the  current  grid  and  at  which  /  has  not  yet  been  evaluated. 

Most  search  strategies  for  a  desirable  xt  will  not  restrict  attention  to  the  current  grid.  Because 
we  require  Xt  €  Fc,  we  suggest  terminating  the  search  when  step  lengths  become  appreciably  smaller 
than  the  minimal  distance  between  grid  points  and  choosing  Xt  to  be  a  grid  point  that  is  near  the 
terminal  iterate  of  the  search.  There  are  various  plausible  ways  of  implementing  this  suggestion. 
We  might  prefer  the  nearest  grid  point  or  we  might  prefer  a  nearby  grid  point  at  which  the  model 
predicts  greater  decrease.  Because  we  require  Xt  ^  Evalc,  we  might  not  actually  select  the  nominally 
preferred  grid  point.  SimUarly,  if  we  are  impatient  to  refine  the  grid,  we  might  select  xt  6  Core(irc) 
even  when  a  nonadjacent  point  is  nominally  preferred. 

5  Numerical  Experiments 

We  now  describe  some  numerical  experiments  intended  to  illustrate  the  viability  of  the  methods 
proposed  in  Section  4.  These  experiments  were  performed  on  a  Toshiba  Satellite  lOOCS  notebook 
computer  with  a  75MHz  Pentium  processor,  using  S-PLUS  for  Windows  software.  In  what  follows, 
univariate  optimization  (for  parameter  estimation)  was  performed  using  the  S-PLUS  function  opti- 
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Minimize! 

Frequency 

(0,-10) 

3 

38 

(-6,-4) 

30 

44 

(18,2) 

84 

8 

(12,8) 

840 

10 

Table  1:  Minima  of  the  rescaled  Goldstein-Price  objective  function  on  [-20,20]  X  [-20,20]  found 
by  100  quasi-Newton  searches. 

mize,  an  implementation  of  Brent’s  (1973)  safeguarded  polynomial  interpolation  procedure.  When 
finite-difference  quasi-Newton  methods  for  multivariate  optimization  were  employed  (typically  for 
minimizing  models),  they  were  performed  using  the  S-PLUS  function  nlminb,  an  implementation 
of  a  trust-region  method  for  bound-constrained  optimization  developed  by  Gay  (1983,  1984). 

The  objective  function  used  in  our  experiments  was  a  rescaled  version  of  an  eighth-order  poly¬ 
nomial  of  p  =  2  variables  constructed  by  Goldstein  and  Price  (1971).  One  of  the  test  problems 
employed  by  Dixon  and  Szego  (1978)  in  their  study  of  global  optimization  was  to  minimize  the 
Goldstein-Price  polynomial  in  the  feasible  region  [-2, 2]  x  [-2, 2].  We  rescaled  this  problem  so  that 
the  feasible  region  was  [—20,20]  X  [—20,20]. 

The  Goldstein-Price  polynomial  is  a  comphcated  function  that  has  four  minimizers  and  is  not 
easily  modelled.  Because  it  is  a  polynomial,  the  minimizers  are  easily  located  by  a  finite-difference 
quasi-Newton  method.  To  estimate  the  relative  sizes  of  their  basins  of  attraction,  we  drew  100 
points  from  a  uniform  distribution  on  the  feasible  region  and  started  nlminb  from  each  point.  The 
results  are  summarized  in  Table  1. 

The  models  that  we  employed  aie  a  special  case  of  the  family  specified  by  (3).  We  set  g  =  1,  so 
that  is  a  scalar  parameter,  and  a{x)  =  1.  We  used  the  correlation  function  specified  by  (5)  and 
estimated  the  scalar  parameter  6  by  applying  optimize  to  (4). 

We  implemented  two  strategies  for  derivative-free  optimization  with  a  small  value  of  V ,  the 
permitted  number  of  objective  function  evaluations.  First,  we  implemented  a  DACE  strategy  with 
V  =  11,16,21,26.  For  these  procedures,  we  constructed  a  model,  /,  from  N  =  V  —  I  design 
sites  obtained  by  Latin  hypercube  sampling.  We  then  minimized  /  by  a  finite-difference  quasi- 
Newton  method,  starting  from  the  initial  design  site  with  the  smallest  objective  function  value. 
The  objective  function  was  then  evaluated  at  the  minimizer  of  /  so  obtained  and  the  smallest  of 
the  V  function  values  was  recorded. 

Second,  we  implemented  the  MAGS  strategy  described  in  Section  4  with  F  =  11, 16.  For  these 
procedures,  it  is  first  necessary  to  specify  an  initial  grid.  This  involves  a  trade-off:  coarse  grids  tend 
to  safeguard  against  unlucky  initial  designs,  whereas  fine  grids  permit  more  function  evaluations 
before  it  becomes  necessary  to  evaluate  /  at  a  core  pattern  of  points  in  order  to  refine  the  grid.  Our 
choice  of  a  family  of  grids  was  further  comphcated  by  the  fact  that  the  minimizers  of  our  objective 
function  occur  on  the  integer  lattice  in  3?^.  To  avoid  rising  grids  that  contain  the  minimizers,  we 
selected  an  initial  grid  of  the  form 

Xi  =  -20  +  jin/2, 

for  ji  =  0, 1,2, . . ..  Subsequent  grids,  which  were  rarely  required,  were  obtained  by  halving  the 
current  distance  between  adjacent  grid  points.  To  obtain  an  initial  design  on  the  initial  grid,  we 
first  obtained  N  =  5  points  by  Latin  hypercube  samphng,  then  moved  each  point  to  the  nearest 
grid  point.  The  initial  model  was  constructed  from  this  design. 
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Percentile 

DACEll 

DACE16 

DACE21 

DACE26 

MAGSll 

MAGS16 

Minimum 

3.4i 

5.29 

5.02 

3.26 

5.77 

3.43 

5% 

9.60 

11.99 

10.64 

9.89 

5.77 

5.77 

10% 

15.63 

13.74 

17.16 

14.42 

6.78 

5.77 

25% 

37.26 

30.83 

26.04 

24.75 

28.98 

6.78 

50% 

116.75 

71.61 

51.69 

41.70 

43.89 

30.48 

75% 

367.95 

122.35 

99.96 

86.33 

87.78 

64.55 

90% 

601.86 

240.92 

190.71 

154.20 

343.21 

101.63 

95% 

951.55 

423.13 

230.89 

281.84 

529.02 

135.22 

Maximum 

1027.52 

1243.90 

774.91 

349.43 

1617.61 

885.32 

Table  2:  Percentiles  of  smallest  values  of  the  Goldstein-Price  objective  function  found  by  six  opti¬ 
mization  strategies,  each  replicated  100  times. 


To  obtain  a  trial  point,  xt,  from  the  current  iterate,  Xo  a  finite-difference  quasi-Newton  method 
was  started  from  Xc  to  obtain  a  minimizer,  x,  of  the  current  model,  fc.  Let  x  denote  the  grid 
point  nearest  x.  If  the  objective  function  had  not  previously  been  evaluated  at  x,  then  we  set 
Xf  =  x]  otherwise,  we  set  Xi  equal  to  the  point  in  Core($)  nearest  x.  Each  time  that  a  new  function 
value  was  obtained,  we  re-estimated  all  three  of  the  model  parameters,  This  process  was 

repeated  until  V  function  evaluations  had  been  performed,  at  which  time  the  smallest  function 
value  was  recorded. 

Each  of  the  six  procedures  described  above  was  replicated  100  times.  The  results  are  summarized 
in  Table  2,  from  which  several  interesting  conclusions  can  be  drawn.  First,  we  note  that  each 
strategy  did  indeed  do  better  when  permitted  more  function  evaluations.  Thus,  DACE  with  P  =  16 
function  evaluations  usually  outperformed  DACE  with  P  =  11  function  evaluations  and  MAGS  with 
P  =  16  function  evaluations  usually  outperformed  MAGS  with  P  =  11  function  evaluations.  This 
result  is  not  surprising. 

Second,  we  note  that  MAGS  typically  found  smaller  function  values  with  fewer  function  eval¬ 
uations  than  did  DACE.  The  crucial  comparisons  are  between  DACE  and  MAGS  with  P  =  11 
function  evaluations  and  between  DACE  and  MAGS  with  P  =  16  function  evaluations.  In  each 
case,  typical  performance  decisively  favors  MAGS.  For  example,  the  median  best  function  value 
found  by  MAGS  11  is  less  than  40  percent  of  the  median  best  value  found  by  DACEll.  Similarly, 
the  median  best  function  value  found  by  MAGS  16  is  less  than  45  percent  of  the  median  best  value 
found  by  DACE  16. 

The  DACE  experiments  with  P  =  21, 26  were  included  to  benchmark  the  performance  of  MAGS. 
Except  for  the  extreme  upper  percentiles,  the  distribution  of  best  function  values  found  by  MAGS 
with  P  =  11  is  strikingly  similar  to  the  distribution  of  best  function  values  found  by  DACE  with 
P  =  26  function  evaluations.  Thus,  another  way  to  state  the  case  for  MAGS  is  to  observe  that 
DACE  typically  required  more  than  twice  as  many  function  evaluations  to  match  the  performance 
of  MAGSll.  Such  observations  confirm  the  central  thesis  of  this  report,  that  optimization  is  most 
eflficiently  accomplished  by  iterative  methods.  This  is  a  familiar  issue  in  statistics  that  is  often 
raised  when  comparing  Taguchi  and  response  surface  methods,  e.g.  in  the  panel  discussion  edited 
by  Nair  (1992)  and  by  Trosset  (1996).  Our  results  strengthen  the  cases  made  by  Prank  (1995), 
Booker  et  al.  (1995),  Booker  et  al.  (1996)  and  Booker  (1996)  for  using  sequential  modehng  strategies 
to  optimize  expensive  functions. 

We  note  that  despite  its  generally  superior  performance,  MAGS  occasionally  disappoints.  For 
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example,  the  lower  75  percent  of  the  distributions  of  best  values  found  by  DACE  with  V  =  26  and 
by  MAGS  with  F  =  11  are  remarkably  similar,  but  the  former’s  upper  10  percent  is  markedly  better 
than  the  latter’s.  MAGS  with  V  =  16  found  a  smallest  function  value  less  than  or  equal  to  142.70 
on  97  occasions,  but  found  smallest  values  of  688.75,  855.64,  and  885.32  on  the  other  three.  This 
anomalous  phenomenon  seems  to  result  from  unfortunate  initial  designs.  With  only  AT  —  5  design 
sites,  Latin  hypercube  sampling  on  a  grid  occasionally  results  in  transparently  bad  designs,  e.g.  five 
collineaj  sites,  from  which  MAGS  has  difficulty  recovering.  To  safeguard  against  this  possibility, 
we  recommend  using  more  sophisticated  procedures  for  determining  the  initial  design. 


6  Conclusions  and  Future  Work 


The  results  presented  in  Section  5  suggest  that  the  potential  value  of  the  methods  that  we  have 
proposed  for  optimizing  expensive  objective  functions  is  considerable.  Of  course,  comprehensive 
numerical  experiments  on  a  variety  of  objective  functions  in  a  variety  of  dimensions  will  be  required 
to  fully  appreciate  the  advantages  and  disadvantages  of  these  methods.  We  plan  to  undertake  such 
investigations  in  future  work.  We  conclude  this  report  by  describing  some  modifications  of  MAGS 
that  we  envision  for  these  investigations. 

We  begin  by  noting  that  the  design  of  the  current  model,  /c,  is  sequential,  comprising  the  initial 
design  sites  xi,...,xn  and  the  trial  sites  xt  subsequently  identified  in  4(a)  (i)  in  Figure  2.  At  present, 
the  initial  design  sites  are  selected  according  to  the  principles  of  experimental  design  without  regard 
to  optimization,  whereas  the  subsequent  trial  sites  are  determined  by  an  optimization  algorithm 
without  regard  to  the  design  of  experiments.  Because  the  sequence  of  trial  points  generated  by  an 
optimization  algorithm  tends  to  cluster  in  promising  regions,  this  sequence  is  rarely  space-filling 
and  is  not  likely  to  be  a  good  experimental  design.  Moreover,  especially  during  the  early  stages  of 
optimization,  greater  gains  are  likely  to  come  from  improving  the  fit  of  to  f  than  from  accurately 
identifying  a  minimizer  of  fc-  Our  present  implementation  of  MAGS  is  a  greedy  algorithm  in  the 
sense  that  it  tries  to  minimize  fc  without  concern  for  the  fit  of  the  subsequent  model,  /+. 

The  desirability  of  balancing  the  concerns  of  numerical  optimization  and  the  concerns  of  ex¬ 
perimental  design  was  recognized  by  Prank  (1995),  who  proposed  a  dichotomous  search  strategy. 
Given  an  optimization  criterion,  e.g.  the  objective  function,  and  a  design  criterion,  one  obtains  some 
fraction  of  new  design  sites  using  one  criterion  and  the  balance  using  the  other .  An  implementation 
of  this  Balanced  Local-Global  Search  (BLGS)  strategy  was  described  by  Booker  et  al.  (1995)  and 
employed  by  Booker  (1996)  and  Booker  et  al.  (1996). 

In  contrast  to  the  dichotomous  BLGS  strategy,  we  envision  modifying  4(a)  (i)  as  follows:  apply 
an  optimization  algorithm  to  a  merit  function,  to  obtain  Xt  £  Fc  \  Eval^.  The  merit  function 
should  be  specified  so  as  to  balance  the  potentially  competing  goals  of  finding  sites  at  which  fc  is 
small  and  choosing  good  design  sites.  Notice  that  this  modification  in  no  way  affects  the  convergence 
theory  for  MAGS — it  is  a  purely  empirical  matter  whether  or  not  it  improves  the  performance  of 
the  algorithm. 

To  illustrate  what  we  have  in  mind,  we  again  invoke  the  fiction  that  the  objective  function,  / , 
is  a  realization  of  a  stochastic  process.  Under  the  assumptions  dehneated  in  Section  3,  the  mean 
squared  error  of  (3)  for  predicting  / (x)  is 


MSE  [fix)]  = 


^1  -  [a(.'E)',r(.T;0)'] 


0  A' 
A  R{e) 


As  detailed  by  Sacks,  Welch,  Mitchell  and  Wynn  (1989),  this  function  plays  a  fundamental  role 
in  several  approaches  to  optimal  design.  It  is  also  the  design  criterion  employed  by  Booker  et  al. 
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(1995)  and  Booker  (1996)  in  BLGS.  Our  idea  is  to  construct  a  merit  function  that  encourages  the 
selection  of  trial  points  at  which  the  mean  squared  error  is  large,  i.e.  at  which  we  beheve  that 
less  is  known  about  the  objective  function.  This  strategy  has  a  great  deal  in  common  with  the 
Sequential  Design  for  Optimization  (SDO)  algorithm  for  global  optimization  proposed  by  Cox  and 
John  (1996). 

The  mean  squared  error  function  can  be  estimated  by  replacing  the  parameters  (/3,  (t‘^,9)  with 
their  MLEs,  resulting  in  an  expression  that  we  denote  by 

l^E  [fix)]  . 

Then  a  natural  family  of  merit  functions  comprises  those  of  the  form 

^c{x)  =  fcix)-pME[Ux)], 

where  Pc  >  0.  We  anticipate  that  a  great  deal  of  numerical  experimentation  will  be  required  to 
determine  useful  choices  of  the  pc- 

We  also  note  that  our  family  of  kriging  models  was  derived  under  the  assumption  of  a  stationary 
stochastic  process.  In  fact,  it  seems  rather  implausible  that  the  same  correlations  will  obtain 
throughout  the  feasible  set,  so  that  we  likely  will  prefer  different  values  of  9  for  different  regions  of 
[a,  6].  This  poses  a  problem,  because  the  fc  are  global  models.  As  we  descend  into  the  basin  of  a 
local  minimizer,  we  would  like  to  use  a  value  of  9  that  is  suited  to  that  basin,  not  a  value  of  9  that 
was  determined  by  a  global  compromise. 

An  obvious  way  of  addressing  this  difficulty  is  to  implement  the  "zoom-in”  process  proposed  by 

Prank  (1995).  To  do  so,  we  occasionally  modify  the  feasible  set,  restricting  it  to  a  much  smaller  set 
in  which  a  minimizer  is  deemed  to  lie.  Then  the  current  model,  fy,  is  determined  by  the  design  sites 
in,  and  restricted  in  domadn  to,  the  current  feasible  set,  [ucjM-  Prom  a  theoretical  perspective, 
this  is  a  delicate  strategy  that  must  be  carefully  implemented  in  order  to  preserve  the  guaranteed 
convergence  that  has  thus  far  been  inherited  from  the  standard  theory  for  pattern  search  methods. 
However,  properly  implemented,  updating  the  feasible  set  may  permit  a  useful  localization  of  the 
kriging  models. 

Assuming  that  we  are  prepared  to  replace  the  current  feasible  set  with  a  subset  of  it,  when 
should  we  do  so?  When  step  4(a)  (i)  begins  to  produce  trial  points  xt  G  Core(.'rc),  one  might  guess 
that  Xc  is  near  a  local  minimizer  and  that  the  algorithm  is  preparing  to  refine  the  current  grid. 
As  previously  noted,  this  can  be  an  expensive  undertaking,  requiring  as  many  as  2p  evaluations 
of  /.  At  this  stage,  an  interactive  user  might  very  well  instruct  the  algorithm  to  refine  the  grid 
vhthout  actually  evaluating  /  on  CoTe(xc),  reasoning  that  an  asymptotic  convergence  theory  may 
be  of  little  relevance  when  one  can  afford  only  a  small  number  of  iterations. 

Alternatively,  one  might  interpret  the  same  scenario  as  a  signal  to  recalibrate  the  problem. 
Pirst,  one  would  replace  the  current  feasible  set  with  a  smaller  one.  Second,  instead  of  performing 
2p  function  evaluations  on  Core(.'rc) — a  set  of  points  that  is  not  a  good  design — one  would  replace 
the  current  grid  with  a  finer  grid,  design  an  experiment  for  the  new  grid,  and  proceed.  At  present, 
there  is  no  theory  to  justify  these  tactics,  but  it  seems  quite  plausible  that  they  will  result  in 
practical  improvements  of  an  already-promising  procedure. 
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